Abstract. One of the key science projects of the Low-Frequency Array (LOFAR)
is the detection of the cosmological signal coming from the Epoch of Reionization 
(EoR). Here we present the LOFAR EoR Diagnostic Database (LEDDB) that is used in 
the storage, management, processing and analysis of the LOFAR EoR observations. It 
stores referencing information of the observations and diagnostic parameters extracted 
from their calibration. This stored data is used to ease the pipeline processing, to
monitor the performance of the telescope and to visualize the diagnostic parameters, which
facilitates the analysis of the various contamination effects on the signals. It is
implemented with PostgreSQL and accessed through the psycopg2 Python module. We have
developed a flexible query engine, which is used by a web user interface to access
the database, and an extensive set of tools for visualizing the diagnostic
parameters across their multiple dimensions.



1. Introduction 

The Low-Frequency Array (LOFAR) is an antenna array that observes at low radio 
frequencies (10 - 240 MHz). It consists of about 70 stations spread around Europe that 
combine their signals to form an interferometric aperture synthesis array (van Haarlem 
et al. in preparation). The LOFAR Epoch of Reionization (EoR) experiment is one of 
the key science projects (KSP) of LOFAR. It aims to study the redshifted 21-cm line of 
neutral hydrogen from the Epoch of Reionization (de Bruyn et al. in preparation). There 
are many challenges that need to be overcome in order to meet this goal, including strong
astrophysical foreground contamination, ionospheric distortions, a complex instrumental
response and different types of noise. The very faint signals from neutral hydrogen
require hundreds of hours of observation, thereby accumulating petabytes of data. To
diagnose and monitor the various instrumental and ionospheric parameters, as well as 
manage the data, we have developed the LEDDB (LOFAR EoR Diagnostic Database). 
Its main tasks and uses are: 

• To store referencing information of the observations, mainly the locations of the 
data but also other indexing information. 

• To store diagnostic parameters of the observations extracted through calibration. 

• To facilitate efficient data management and pipeline processing. 

• To monitor the performance of the telescope as a function of date.

• To visualize the diagnostic parameters. For example, we can inspect the complex
gain of all the stations as a function of time and frequency to reveal ionospheric
distortions affecting a large part of the array.



2. Data flow 

The data from the stations is sent to the Central Processing Facility (CEP) located in
Groningen (the Netherlands), where it is correlated, among other processing steps.
Afterwards, the data is stored in the Long Term Archive (LTA) in Groningen. From the
LTA we copy the data to the LOFAR EoR CPU/GPU cluster, also in Groningen, where
we process it with the LOFAR EoR pipeline. The LEDDB takes care of storing the
locations of the data both in the LTA and in the LOFAR EoR cluster. It also stores all
the diagnostic data produced by the pipeline. Since we cannot keep all the data in the
LOFAR EoR cluster, we must archive it in the LTA, but thanks to the LEDDB we retain
access to all its diagnostic information.



3. Database definition 

The LEDDB is implemented with PostgreSQL and accessed through a Python interface
provided by the psycopg2 module. It is part of a research project with still-evolving
requirements, so one of the key points of the design was to make it flexible enough to
meet new requirements such as the addition of new diagnostic parameters. The content
of the database is categorized under three different blocks: the referencing information,
the diagnostic data and the meta-data. In figure 1 we show the Entity-Relationship
diagram of the database with its blocks, the tables involved and their relationships.

(1) The referencing information block ("REF" in figure 1) contains five primary
tables: LOFAR_DATASET (LDS), LOFAR_DATASET_BEAM (LDSB),
LOFAR_DATASET_BEAM_PRODUCT (LDSBP), MEASUREMENTSET (MS) and
MEASUREMENTSET_PRODUCT (MSP). They contain information about the observations: their names,
date and time information, the pointed fields and other indexing information. They also
store the locations of the data, i.e., the host and cluster the data is in and the path to the
files. The remaining tables in this block are secondary tables, which are only used to
ease selection on the primary ones.

(2) The diagnostic data block ("DIAG" in figure 1) contains the diagnostic
parameters related to the observations. There are four primary tables: the GAIN table
and three QUALITY tables. They store the gain solutions of the stations and
baseline-based, frequency-based and time-based statistical parameters of the data. There is also a
secondary table in this block called QUALITY_KIND.

(3) Finally, the meta-data block ("META" in figure 1) stores information regarding
the relationships of the referencing section and the diagnostic data. Each one of the
referencing tables is joined with each related meta-data table.

The LEDDB can generate a RefFile or a DiagFile. A RefFile is a file containing
locations of data related to the observations. This file is used in the LOFAR EoR pipeline
processing tasks. On the other hand, a DiagFile contains references to diagnostic data
in the LEDDB.
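
As a rough illustration of how referencing information might be turned into a list of
data locations, the sketch below uses Python's built-in sqlite3 module as a stand-in for
PostgreSQL and psycopg2, so that it runs without a database server. The two table names
come from the text, but the direct link between them, all column names, the example hosts
and paths, and the one-location-per-line "host:path" output are assumptions made for the
example; the actual LEDDB schema and RefFile format are not described here.

```python
import sqlite3

# Hypothetical, heavily simplified subset of the referencing block.
# Table names follow the text; all column names are assumptions.
SCHEMA = """
CREATE TABLE LOFAR_DATASET (
    lds_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,      -- observation name, e.g. 'L60639'
    field  TEXT                -- pointed field
);
CREATE TABLE MEASUREMENTSET_PRODUCT (
    msp_id INTEGER PRIMARY KEY,
    lds_id INTEGER NOT NULL REFERENCES LOFAR_DATASET(lds_id),
    host   TEXT NOT NULL,      -- node holding the data
    path   TEXT NOT NULL       -- path to the files on that node
);
-- Index the column used in the join so lookups stay fast as rows grow.
CREATE INDEX msp_lds_idx ON MEASUREMENTSET_PRODUCT (lds_id);
"""


def make_ref_lines(conn, observation):
    """Return 'host:path' lines for every data product of one observation."""
    rows = conn.execute(
        "SELECT p.host, p.path "
        "FROM MEASUREMENTSET_PRODUCT AS p "
        "JOIN LOFAR_DATASET AS d ON d.lds_id = p.lds_id "
        "WHERE d.name = ? ORDER BY p.host, p.path",
        (observation,),
    ).fetchall()
    return ["%s:%s" % (host, path) for host, path in rows]


# Made-up example rows (hosts and paths are illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO LOFAR_DATASET VALUES (1, 'L60639', 'Elais')")
conn.executemany(
    "INSERT INTO MEASUREMENTSET_PRODUCT VALUES (?, ?, ?, ?)",
    [(1, 1, "node002", "/data/L60639_SB001.MS"),
     (2, 1, "node001", "/data/L60639_SB000.MS")],
)
ref_lines = make_ref_lines(conn, "L60639")
```

With psycopg2 the same query would use %s placeholders instead of ?, and the connection
would come from psycopg2.connect rather than sqlite3.connect.
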
Figure 1. Entity-Relationship diagram of the LEDDB. Only table names and key
columns are shown.



4. Diagnostic data analysis 



The diagnostic data can have multiple dimensions: time, frequency, baseline
(interferometer), station, polarization correlation and others, depending on the situation.
In general they are complex numbers. We provide plotting and animation tools
implemented with matplotlib to analyse such multi-dimensional data. In figure 2 we show an
example of one of the produced plots.
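
To make the handling of complex-valued diagnostics more concrete, the following minimal
sketch splits a series of complex gains into amplitude and phase and computes the phase
of one station relative to another. It is not the LEDDB plotting code: the function
names are hypothetical, the gains are synthetic, and the optional plot_gains helper only
indicates how such series could be drawn with matplotlib.

```python
import cmath


def amplitude_and_phase(gains):
    """Split complex gains into amplitude and phase (radians) lists."""
    return [abs(g) for g in gains], [cmath.phase(g) for g in gains]


def phase_difference(gains_a, gains_b):
    """Phase of station A relative to station B, in the range [-pi, pi]."""
    return [cmath.phase(a * b.conjugate()) for a, b in zip(gains_a, gains_b)]


def plot_gains(times, gains_by_station, filename):
    """Plot amplitude and phase versus time for each station (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, (ax_amp, ax_phase) = plt.subplots(2, 1, sharex=True)
    for station, gains in gains_by_station.items():
        amp, phase = amplitude_and_phase(gains)
        ax_amp.plot(times, amp, label=station)
        ax_phase.plot(times, phase, label=station)
    ax_amp.set_ylabel("Amplitude")
    ax_phase.set_ylabel("Phase (rad)")
    ax_phase.set_xlabel("Time (s)")
    ax_amp.legend()
    fig.savefig(filename)
    plt.close(fig)


# Synthetic gains, not real data: a stable station and one whose phase drifts.
times = [10.0 * i for i in range(60)]
core = [cmath.rect(1.0, 0.0) for _ in times]
remote = [cmath.rect(0.9, 0.05 * i) for i in range(len(times))]
drift = phase_difference(remote, core)
```

Calling plot_gains(times, {"core": core, "remote": remote}, "gains.png") would write a
two-panel figure of amplitude and phase against time.
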




Figure 2. Gain as a function of time of one of the polarization auto-correlations 
of two different stations at 138 MHz for the observation L60639 (Elais field). Note 
the phase difference between a core station (CS001HBA0) and a remote station
(RS508HBA), mainly caused by the ionosphere. 



5. Query engine and User Interface 



The query engine is a Python API which provides fast and flexible access to the database.
We use a Python-based web server (CherryPy) to interface with the query engine. The
client-side user interface (UI) in the web page is implemented with the jQuery UI
framework. In figure 3 we show a snapshot of the web UI.
Figure 3. Snapshot of the web UI. Each tab in the UI represents a primary table
in the database.



We estimate that 10 terabytes of diagnostic data will be stored in the LEDDB
for the full LOFAR EoR KSP (currently it is 75 gigabytes). Beyond the total size,
the number of rows in some of the tables is the most important aspect to
take into account in the design of the database and its query engine, as it is
the main bottleneck in the queries. We provide fast access through
efficient table indexing, minimizing the number of join operations and
using persistent connections, which are eased by the session handling of the CherryPy
framework.

The query engine provides functionality to sort and to filter, both by column values and by
selections in the primary and secondary tables. The UI uses this to offer an
extensive set of options for accessing the data.
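
A minimal sketch of how sorting and filtering by column values could be translated into
a parametrized SQL statement is given below. It is not the LEDDB query engine: the
build_query function, the operator keys and the column names are hypothetical. The
sketch checks identifiers against a whitelist and leaves values as %s placeholders,
which is the parameter style that psycopg2 accepts.

```python
# Hypothetical whitelist of tables and columns; the real LEDDB columns
# are not listed in the text.
ALLOWED_COLUMNS = {
    "LOFAR_DATASET": {"name", "field", "start_time"},
    "GAIN": {"station", "frequency", "obs_time"},
}

OPERATORS = {"eq": "=", "lt": "<", "gt": ">", "le": "<=", "ge": ">="}


def build_query(table, filters=(), order_by=None, descending=False):
    """Build a parametrized SELECT; returns (sql, params).

    filters is a sequence of (column, operator_key, value) tuples.
    Identifiers are checked against a whitelist; values are never
    interpolated into the SQL text but returned separately as parameters.
    """
    columns = ALLOWED_COLUMNS.get(table)
    if columns is None:
        raise ValueError("unknown table: %r" % (table,))
    clauses, params = [], []
    for column, op_key, value in filters:
        if column not in columns or op_key not in OPERATORS:
            raise ValueError("invalid filter: %r" % ((column, op_key),))
        clauses.append("%s %s %%s" % (column, OPERATORS[op_key]))
        params.append(value)
    sql = "SELECT * FROM %s" % table
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if order_by is not None:
        if order_by not in columns:
            raise ValueError("invalid sort column: %r" % (order_by,))
        sql += " ORDER BY %s%s" % (order_by, " DESC" if descending else "")
    return sql, params


sql, params = build_query(
    "GAIN",
    filters=[("station", "eq", "CS001HBA0"), ("frequency", "ge", 138.0e6)],
    order_by="obs_time",
)
```

The resulting pair can be handed to a psycopg2 cursor as cursor.execute(sql, params),
so that the driver, not string formatting, takes care of quoting the values.
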

The UI allows the user to create both RefFiles and DiagFiles. The UI can also
be used to launch pipeline jobs from a RefFile and to plot diagnostic data directly from a
DiagFile.



6. Future developments 

We will focus on minimizing the access times as the database grows and on
improving the tools to analyse the diagnostic parameters. New diagnostic
parameters may also be added. There is also a plan to migrate the database to a new server
specifically designed for this purpose.